The Nearest Feature Midpoint - A Novel Approach for Pattern Classification

Authors

  • Zonglin Zhou
  • Chee Keong Kwoh
Abstract

In this paper, we propose a method for pattern classification called the nearest feature midpoint (NFM). Any pair of feature points of the same class is generalized by the feature midpoint (FM) between them, which expands the representational capacity of the available prototypes. Classification is determined by the nearest distance from the query feature point to each FM. We compare the NFM classifier against the nearest feature line (NFL) classifier, which has reported successes in various applications. In the NFL, any pair of feature points of the same class is generalized by the feature line (FL) passing through them, and classification is based on the nearest distance from the query feature point to each FL. The NFM can be considered a refinement of the NFL. We provide a theoretical proof that, for the n-dimensional Gaussian distribution, classification based on the NFM distance metric achieves the lowest error probability compared with classification based on any other point on the feature lines. Furthermore, a theoretical investigation shows that, under certain assumptions, the NFL is approximately equivalent to the NFM when the dimension of the feature space is high. Experimental evaluations on both simulated and real-life benchmark data agree with the theoretical investigations and indicate that the NFM is effective for classifying data with a Gaussian distribution or with a distribution that can be reasonably approximated by a Gaussian.
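The two decision rules described in the abstract can be sketched in a few lines of Python. This is a minimal illustration written from the abstract alone, not the authors' implementation: the function names (`nfm_distance`, `nfl_distance`, `classify`), the use of Euclidean distance, and the toy prototypes are all assumptions, and each class is assumed to have at least two prototypes.

```python
from itertools import combinations
from math import dist, inf


def nfm_distance(query, prototypes):
    """Smallest Euclidean distance from query to the midpoint of any
    pair of prototypes from one class (assumed reading of the NFM rule)."""
    best = inf
    for a, b in combinations(prototypes, 2):
        midpoint = [(x + y) / 2 for x, y in zip(a, b)]
        best = min(best, dist(query, midpoint))
    return best


def nfl_distance(query, prototypes):
    """Smallest Euclidean distance from query to the infinite line
    through any pair of prototypes from one class (NFL rule)."""
    best = inf
    for a, b in combinations(prototypes, 2):
        direction = [y - x for x, y in zip(a, b)]
        denom = sum(d * d for d in direction)
        if denom == 0:
            # Coincident prototypes: the line degenerates to a point.
            foot = list(a)
        else:
            # Orthogonal projection of the query onto the line a + t * direction.
            t = sum((q - x) * d for q, x, d in zip(query, a, direction)) / denom
            foot = [x + t * d for x, d in zip(a, direction)]
        best = min(best, dist(query, foot))
    return best


def classify(query, class_prototypes, distance=nfm_distance):
    """Return the label whose prototypes give the smallest distance."""
    return min(class_prototypes,
               key=lambda label: distance(query, class_prototypes[label]))


# Hypothetical toy data: two classes with two prototypes each.
prototypes = {
    "a": [(0.0, 0.0), (2.0, 0.0)],
    "b": [(0.0, 5.0), (2.0, 5.0)],
}
label = classify((1.0, 1.0), prototypes)
```

Passing `distance=nfl_distance` to `classify` switches to the NFL rule. The difference shows up for queries far from the prototypes: a feature line extends indefinitely, so a distant query can still lie close to it, whereas the midpoint stays between the two prototypes.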

Similar resources

A Novel Scheme for Improving Accuracy of KNN Classification Algorithm Based on the New Weighting Technique and Stepwise Feature Selection

The K-nearest neighbor (KNN) algorithm is one of the most frequently used techniques in data mining for its integrity and performance. Though the KNN algorithm is highly effective in many cases, it has some essential deficiencies, which affect its classification accuracy. First, the effectiveness of the algorithm is affected by redundant and irrelevant features. Furthermore, this algori...

A Classification Method for E-mail Spam Using a Hybrid Approach for Feature Selection Optimization

Spam is unwanted email that is harmful to communications around the world. It is a growing problem in personal email, so detecting it is essential. Machine learning is very useful for this problem, as it shows good results in learning the patterns required for classification thanks to its adaptive nature. Nonetheless, in spam detection, there are a large num...

A Novel Approach to Feature Selection Using PageRank algorithm for Web Page Classification

In this paper, a novel filter-based approach is proposed using the PageRank algorithm to select the optimal subset of features as well as to compute their weights for web page classification. To evaluate the proposed approach multiple experiments are performed using accuracy score as the main criterion on four different datasets, namely WebKB, Reuters-R8, Reuters-R52, and 20NewsGroups. By analy...

A Novel Hybrid Approach for Email Spam Detection based on Scatter Search Algorithm and K-Nearest Neighbors

Because cyberspace and the Internet predominate in users' lives, they bring, in addition to business opportunities and time savings, threats such as information theft and penetration into systems, in both hardware and software. Security is the top priority in preventing cyber-attacks, and users should first detect the type of attack, because virtual environments are not moni...

An Improved K-Nearest Neighbor with Crow Search Algorithm for Feature Selection in Text Documents Classification

The Internet provides easy access to many kinds of library resources. However, classifying documents from a large amount of data is still an issue, and finding particular documents demands time and energy. Classifying similar documents into specific classes can reduce the time needed to search for the required data, particularly text documents. This is further facilitated by using Artificial...

Publication date: 2005